Using Zero-Resource Spoken Term Discovery for Ranked Retrieval
Abstract
Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.
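As a rough illustration of the last step described above, combining evidence from matching regions into a ranked list, the following minimal sketch assumes each acoustic match has already been reduced to a (document, score) pair. The function name `rank_documents`, the input format, and the three combination rules (sum, max, count) are hypothetical choices made for this example; the abstract does not say which combination methods the paper actually uses.

```python
from collections import defaultdict


def rank_documents(matches, combine="sum"):
    """Rank documents from per-region match scores.

    matches: iterable of (doc_id, score) pairs, one per matched speech
             region (hypothetical input format, assumed for illustration).
    combine: "sum", "max", or "count" (illustrative combination rules,
             not taken from the paper).
    Returns a list of (doc_id, combined_score), best first.
    """
    # Group the scores of all matched regions by the document they fall in.
    per_doc = defaultdict(list)
    for doc_id, score in matches:
        per_doc[doc_id].append(score)

    # Each rule reduces a document's list of region scores to one number.
    combiners = {"sum": sum, "max": max, "count": len}
    reduce_scores = combiners[combine]

    scored = {doc_id: reduce_scores(scores) for doc_id, scores in per_doc.items()}

    # Sort by descending combined score; break ties by document id.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    example = [("a", 0.9), ("b", 0.5), ("b", 0.6), ("c", 0.2)]
    for rule in ("sum", "max", "count"):
        print(rule, [doc for doc, _ in rank_documents(example, combine=rule)])
```

Under these assumptions the choice of rule changes the ranking: summing rewards documents with many matching regions, while taking the maximum rewards a single strong match.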
Similar resources
Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval
Title of dissertation: Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval. J. Scott Olsson, Doctor of Philosophy, 2008. Dissertation directed by: Associate Professor Douglas W. Oard, College of Information Studies. This dissertation considers the problem of information retrieval in speech. Today’s speech retrieval systems generally use a large vocab...
Zero-Resource Audio-Only Spoken Term Detection Based on a Combination of Template Matching Techniques
Spoken term detection is a well-known information retrieval task that seeks to extract contentful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module comprises the cascade of a segmental variant of dynamic ti...
The Zero Resource Speech Challenge 2015: Proposed Approaches and Results
This paper reports on the results of the Zero Resource Speech Challenge 2015, the first unified benchmark for zero resource speech technology, which aims at the unsupervised discovery of subword and word units from raw speech. This paper discusses the motivation for the challenge, its data sets, tasks and baseline systems. We outline the ideas behind the systems that were submitted for the two ...
Towards spoken term discovery at scale with zero resources
The spoken term discovery task takes speech as input and identifies terms of possible interest. The challenge is to perform this task efficiently on large amounts of speech with zero resources (no training data and no dictionaries), where we must fall back to more basic properties of language. We find that long (∼ 1 s) repetitions tend to be contentful phrases (e.g. University of Pennsylvania) ...
Data-driven Posterior Features for Low Resource Speech Recognition Applications
In low resource settings, with very few hours of training data, state-of-the-art speech recognition systems that require large amounts of task specific training data perform very poorly. We address this issue by building data-driven speech recognition front-ends on significant amounts of task independent data from different languages and genres collected in similar acoustic conditions as the da...